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We consider a two-stage procedure (TSP) for estimating an in- 
verse regression function at a given point, where isotonic regression 
is used at stage one to obtain an initial estimate and a local linear 
approximation in the vicinity of this estimate is used at stage two. We 
establish that the convergence rate of the second-stage estimate can 
attain the parametric n^^^ rate. Furthermore, a bootstrapped variant 
of TSP (BTSP) is introduced and its consistency properties studied. 
This variant manages to overcome the slow speed of the convergence 
in distribution and the estimation of the derivative of the regression 
function at the unknown target quantity. Finally, the finite sample 
performance of BTSP is studied through simulations and the method 
is illustrated on a data set. 



1. Introduction. The problem of estimating an inverse regression func- 
tion has a long history in Statistics, due to its importance in diverse areas 
including toxicology, drug development and engineering. The canonical for- 
mulation of the problem is as follows. Let 

Y = f{x) + e, 

where / is a monotone function establishing the relationship between the 
design variable x and the response Y, and e an error term with zero mean 
and finite variance cr^. Further, without loss of generality, it is assumed that 
/ is isotonic and x G [0, 1]. It is of interest to estimate do = f~^{6o) for some 
9o in the interior of the range of /, given f'{do) > 0. 
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One Stage Data 
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Fig. 1. The average delay as a function of system's 



Depending on the nature of the problem, one usually first obtains an 
estimate of / and subsequently of do , either from observational data or from 
design studies [25]. In the latter case, one specifies a number of values for 
the design variable, and obtains the corresponding responses, which are then 
used to get the estimates. 

Motivated by an engineering application, fully described in Section 5, 
we introduce a two-stage design for estimating cIq. Specifically, we consider 
a complex queueing system operating in discrete time under a through- 
put (average number of customers processed per unit of time) maximizing 
resource allocation policy (for details, see Bambos and Michailidis [2]). Un- 
fortunately the customers' average delay, which is an important "quality-of- 
service" metric of the performance of the system, is not analytically tractable 
and can only be obtained via expensive simulations. The average delay as 
a function of the system's loading (number of customers arriving per unit 
of time) is depicted in Figure 1. The relationship between system loading 
and average delay cannot be easily captured by a simple parametric model; 
hence, a nonparametric estimator might be more useful. In addition, given 
that the responses are obtained through simulation, only a relatively small 
number of simulation runs can be performed. It is of great interest for the 
system's operator to obtain accurate estimates of the loading corresponding 
to prespecified delay thresholds (e.g., 10 and 15 time units), so as to be able 
to decide whether to upgrade the available resources. 

The main idea of the proposed two-stage approach is summarized next: at 
stage one, an initial set of design points and their corresponding responses 
are generated. Then a first-stage nonparametric estimate of / is obtained 
and subsequently a first-stage estimate of do. Next, a second-stage sampling 
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interval covering with high probabihty is specified and all new design 
points are laid down at the two boundary points of this interval and their 
responses obtained. Finally, a linear regression model is fitted to the second- 
stage data by least squares and a second-stage estimate of do computed 
as the inverse of the locally approximating line of / at ^o- As we will see, 
the employment of a local linear approximation at stage two allows the 
second-stage estimate of do to attain a ^/n parametric rate of convergence, 
despite the nonparametric nature of the problem. To overcome estimation 
of several tuning parameters required by the second-stage estimate, a boot- 
strapped variant is introduced and its consistency properties established. 
To clinch the asymptotic results of the proposed two-stage estimate and its 
bootstrapped counterpart, a number of subtle technical issues need to be 
addressed and these are resolved in subsequent sections. Before proceeding 
further, it is important to draw attention to the fact that our proposed two- 
stage method relies critically on the reproducibility of the experiment: that 
is, at any stage, it is possible to sample responses from any pre-specified 
covariate value. While reproducibility in this sense is guaranteed for our 
motivating application, the two-stage procedure above is not applicable in 
the absence of adequate degree of control on the covariate. For example, if 
the covariate is time, the implementation of a two-stage procedure would 
require one to go back and sample from the past, which is impossible. 

Isotonic regression is a conceptually natural and computationally efficient 
estimation method for shape-restricted problems [6, 31]. In the framework 
of regression, the asymptotic distribution for the isotonic regression esti- 
mator at a fixed point was first derived in Brunk [8], and then extended 
in Wright [37] and Leurgans [21]. The asymptotic distribution for the Li- 
distance between the isotonic estimator and the regression function was 
obtained in Durot [9] , paralleling Groeneboom, Hooghiemstra and Lopuhaa 
[15] on a unimodal density, and then extended in Durot [10, 11]. Banerjee 
and Wellner [5] derive the asymptotic distribution for the inverse of the 
distribution function of the survival time at a given point in the current 
status model; the regression version of this result will be used to derive the 
asymptotics for the two stage procedures. 

The inverse regression problem has been extensively studied in the context 
of different applications. For example, in statistical calibration, the goal is to 
estimate a scalar quantity do from a model Z = f{do) + e, with Z observed. 
The information about the underlying function / comes from experimental 
data {Yi,Xi} that follow the same regression model; namely, Yi = f{Xi) + 
Ei. Osborne [28] gives a comprehensive review of this topic and Gruet [17] 
provides a kernel based direct nonparametric estimator of do. It is clear 
that when e = 0, the calibration problem becomes the canonical problem 
described above. 
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Another active area is provided by the model-based dose-finding prob- 
lems in toxicology and drug development, where do corresponds to either 
the maximal tolerated dose or the effective dose with respect to a given 
maximal toxicity or an efficacy level. Possible dose levels are often prespeci- 
fied. The dose-response relationship is usually assumed to be monotone and 
described either by parametric models (e.g., probit, logit [25], multihit [29], 
cubic logistic [24]), or by nonpar ametric models, for which kernel estimates 
[35] and isotonic regression [36] are employed. On the other hand, due to eth- 
ical and budget considerations, most studies encompass sequential designs, 
so that more subjects (e.g., patients) receive doses close to the target do (see 
Rosenberger [32] and Rosenberger and Haines [33] for comprehensive reviews 
on the subject). Stylianou and Flournoy [36] compare parametric estimators 
using maximum likelihood and weighted least squares based on the logit 
model and nonparametric ones using sample mean and isotonic regression 
with a sequential up-and-down biased coin design, and show that a linearly 
interpolated isotonic regression estimator performs best in most simulated 
scenarios. Further, Ivanova et al. [19] claim that the isotonic regression esti- 
mator still performs best for small to moderate sample sizes with several se- 
quential designs from a family of up-and-down designs; Gezmu and Flournoy 
[14] recommend using smoothed isotonic regression with their group up-and 
down designs. All these partially motivate the use of isotonic regression in 
our two-stage procedure, though it should be noted that our approach is 
markedly different from the ones discussed above, owing to the different 
nature of the motivating application; in particular, ethical constraints that 
prevent administration of high dose-levels are absent in our situation. 

In a nonparametric setting, one could also employ a fully sequential 
Robbins-Monro procedure [30] for finding do. This corresponds to a stochas- 
tic version of Newton's scheme for root finding problems. Anbar [1] con- 
sidered a modified Robbins-Monro type procedure approximating the root 
from one side. A good review of this area is provided in Lai [20], in which 
it is also pointed out that the procedure usually exhibits an "unsatisfactory 
finite-sample performance except for linear problems" especially in noisy set- 
tings, due to the fact that it does not incorporate modeling for (re) using all 
the available — up to that instance — data. Another downside of a sequential 
design, as opposed to the batch design employed in this study, is the time 
and effort required to collect the data points [26]. 

The remainder of the paper is organized as follows: Section 2 describes the 
two-stage procedures. The asymptotic properties of the two-stage estimators 
are obtained in Section 3. Simulation studies and data analysis are presented 
in Sections 4 and 5, respectively. We close with a discussion in Section 6, 
which is followed by an Appendix containing technical details. 
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2. Two-stage procedures. In this section, we review some necessary back- 
ground material and introduce the proposed two-stage estimation proce- 
dures. 

2.1. Preliminaries: A single-stage procedure. We review some material 
on estimating the parameter of interest do by using isotonic regression com- 
bined with a single-stage design. The procedure is outlined next: 

1. Choose n increasing design points {a;^}"^^ G [0, 1] and obtain the corre- 
sponding responses that are generated according to Yin = f {xin) + ^im i = 
1,2, ... ,n, where / is in ,^q, a class of increasing real functions on [0, 1] 
with positive and continuous first derivatives in a neighborhood of do 
and Ein are independently and identically distributed (i.i.d.) random er- 
rors with mean zero and constant variance cj^. Note that the subscript n 
will be suppressed from now on for simplicity of notation. 

2. Obtain the isotonic regression estimate / of / from the data {(a^j, . 
(For details, see. Chapter 1 of Robertson, Wright and Dykstra [31].) 

3. Estimate do by d'h^ = /"H^'o) = inf{2; G [0,1]: f{x) > 9o}, where = 
f{do). 

In order to study the properties of / and dn \ we consider the following 
further assumption on the design points. 

(Al) There exists a distribution function G, whose Lebesgue density g is 
positive at do, such that sup^-g^ — G{x)\ = o{n~^^^), where F„ 

is the empirical function of {xj}"^]^. 

For example, the discrete uniform design Xi =i/n for i = 1,2, . . . ,n satisfies 
(Al) with G being the uniform distribution on [0, 1] and g{do) = 1 > 0. 

The following basic result provides the asymptotic distribution of dn^ . 

Theorem 2.1. ///g^q and (Al) holds, 

ni/3(da)_do)4cZ, 
where C = [Aa"^ / {f {doY g{do))Y^^ and 'L follows Chernoff's distribution. 

Remark 2.1. Chernoff's distribution is the distribution of the almost 
sure unique maximizer of B(t) — t^ on R, where B{t) denotes a two-sided 
standard Brownian motion starting at the origin [-B(O) =0]. It is symmetric 
around zero, with tails dwindling faster than those of the Gaussian and its 
quantiles have been tabled in Groeneboom and Wellner [16]. 

The proof of Theorem 2.1 follows by adaptations of the arguments from 
Theorem 1 in Banerjee and Wellner [5] to the current regression setting. 
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Hence, an approximate confidence interval for do with significance level 1 — 
2a can be constructed as follows: 

(2.1) [JW - + n-^/^Cqa] n (0, 1), 

where qa denotes the upper a quantile of Chernoff 's distribution and C is 
a consistent estimate of C. 

In the presence of relatively small budgets for design points, the slow 
convergence rate and the need to estimate f'{do) adversely impact the per- 
formance of this procedure. In order to accelerate the convergence rate, we 
propose next an alternative that is based on a two-stage sampling design 
and uses local linear approximation for / in stage two. 

2.2. Procedures based on two-stage sampling designs. We describe next 
a hybrid estimation procedure for estimating do based on a two-stage sam- 
pling design. Suppose that the total budget consists of n doses that are going 
to be allocated in two stages. 

1. Allocate ni = np,p G (0, 1) design points and obtain the first-stage data 

{{xi,Yi)}^^^, the isotonic regression estimate of / and the estimate dni 
of do as outlined in Section 2.1. Note that by np, we denote by [np\ or 
[np\ + 1, depending on whether n — [np\ is even or not. Also, recall that 
the additional subscript n is suppressed. 

2. Determine two second-stage sampling points L and U symmetrically 
around dni , where L = dni — Kn^^ and U = dni + Kn^^ , for some con- 
stants 7 > and K >0. 

3. Allocate the remaining n — ni design points equally to L and U and 
generate the responses as = f{L) + e'^ and = f{U) + e'- for i = 
1,2, .. . ,71-2, with {e'^} and {ef} being i.i.d. random errors with mean zero 
and constant variance <t^, mutually independent and also independent 
of {ej. 

4. Fit the second-stage data {{L,Y/),{U,Y")} with the linear model y = 
/5o + l^ix using least squares. Denote the resulting intercept and slope 
estimates by {/3o,l3i), respectively. Then, the second-stage (or two-stage) 
estimator of do is given hy dn = {6q — /3o)//3i. 

~(2) 

Asymptotic properties of dh will be established in Section 3.1. For ex- 
ample, when / is in a subset of ,^o, denoted as the third derivatives of 
whose elements are uniformly bounded around do, and 7 G (1/4,1/3), we 
have 

(2.2) "'^'(■'l. '-"0)^ 

where -4 denotes convergence in distribution. Thus, the convergence rate 
of the two-stage estimator of do becomes n^/^, the standard parametric 
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convergence rate, which is faster than the n^/^ convergence rate of the one- 
stage isotonic regression estimator. 

However, when constructing confidence intervals from asymptotic results 
like (2.2), we face two difficulties. One is that the limiting distributions 
of interest still depend on /'(do), accurate estimation of which is difficult 
for small to moderate sample sizes. The other one, which is less obvious 
but perhaps with more serious practical implications, is that the asymptotic 
results of interest suffer slow speed of convergence in distribution. Therefore, 
a bootstrap variant of the two-stage procedure that avoids direct estimation 
of /'(do) is introduced and is seen to relieve the slow convergence problem. 

2.3. Bootstrapping the two-stage estimator. The steps of the bootstrapped 
two-stage procedure are outlined next. 

1. Follow steps 1-4 to obtain the second stage design points L and U, re- 

spouses {Y-} and {Y-'} and dn . 

2. Sample with replacement, responses {Y/*}^^^ and from {Y-}^^^ 
and respectively. Construct the corresponding bootstrapped 

second-stage (or two-stage) estimator dn , and calculate the correspond- 
ing root R*^ = n^^id^n^* - d^n^). 

3. Repeat the previous step B times to obtain {Rr!'}i^=i- Subsequently, cal- 
culate the lower and upper a quantiles, qf and q*, of {R^}j^:^i- Finally, 
construct a 1 — 2a bootstrapped Wald-type confidence interval for do as 

(2.3) [dl^)-n-V2,^,dl2)-n-VV]. 

Note that the procedure does not require estimation of /'(do). 

The asymptotic properties of the bootstrapped two-stage estimator are 
established in Section 3.2. For example, when / G 7 S (0,1/3) and all 
the absolute moments of the random error are finite, we have 

(2.4) nV2(J(^K _ ^4 jj^^-^^—^NiO, 1), (P-a.s.), 

where implies convergence in distribution conditional on the data ob- 
tained from the employed two-stage design. 

From (2.2) and (2.4), the strong consistency of the bootstrapped esti- 

mator dn is ensured for / G ^ and 7 S (1/4,1/3). In fact, the strong 
assumption on the random error can be replaced by a mild one that the 
sixth moment of the random error is finite, at the price of replacing strong 
consistency with weak consistency. Therefore, the bootstrapped procedure 
is theoretically validated under certain conditions. 

Remark 2.2. Both the two-stage estimator and its bootstrapped vari- 
ant rely on the choice of a number of tuning parameters: p, 7 and K. Prac- 
tical procedures for their selection will be discussed in Section 4. 
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3. Asymptotic properties of two-stage estimators. In this section, we 
establisii the asymptotic properties of both the two-stage estimator and its 
bootstrapped variant for do. We start by discussing the two-stage estima- 
tor dn\ 



3.1. Two-stage estimator. All results in this subsection are derived under 
the assumption (Al). According to the two-stage procedure, 

TL'Z 

00 Ji) = argminX;[(i;' " /3o - f^iLf + (F/' - /3o - AC/)^]. 

/3o,/3i6R ^ 

Denote y+ = F/' + Y/ and Yr = Y/' - F/. Then, 

(3.1) ^o = (2n2)-^52y+-dlVA, ^i = {2Kn-^n2r'J2^-. 

1=1 i=l 
" ~(2) 

Setting 6*0 = /3o + A^n gives 

(3.2) d^) = {i/^,)(eo-M = a/^) 

~(2) 

In order to analyze dn , additional assumptions about the smoothness of 
the underlying function / around do are required. We consider the following 
three classes of underlying functions: 

^ = {/G^o:/"'(x) is UBN(do)}, 

=^1 = {/ G ^0 : f'ido) + 0, /"'(x) is UBN((io)}, 

^2 = {/ G ^0 : r\do) = 0, /'"(do) / 0, /W(x) is UBN(do)}, 

where UBN((io) means "uniformly bounded in a neighborhood of do." Then, 
the mutually exclusive and ,^2 are subsets of ^ . 



(2n2)-i£y,^ 



Remark 3.1. A function in ^2 is exactly locally linear at do while that 
in ^1 is not. Notice that both ^2 and depend on do. For example, con- 
sider the sigmoid function f{x) = exp{a(x — 5)}/(l + exp{a(3; — 6)}) for some 
constants a > and h G (0, 1). It belongs to ,^2 if do = 6 and to ^1 other- 
wise. Obviously, the size of ^2 is much smaller than that of ^i. However, 
the asymptotic results for / G ^2 should also provide good approximations 
for functions that are approximately linear in the vicinity of do- Hence, the 
class ^2 is also of interest. 

We consider next the asymptotic properties of dn , starting with the 
consistency of the two-stage estimator. 
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Lemma 3.1. For / G ^ and 7 G (0, 1/2), we have 

/3o 4 /(do) - /'(do)do, /3i^m) and S^^^do. 

~(2) 

Based on Lemma 3.1, we obtain the asymptotic distribution of dn in 
the next theorem. It turns out that the asymptotic results with / S and 
^2 are the same for 7 > 1 /4. This imphes that the nonhnearity of / at do 
becomes asymptotically ignorable as the length of the neighborhood of do 
shrinks fast enough. 

Theorem 3.2. For f e ^ and 7 G (1/4, 1/2), 

nV2(J(2) _ 4 C2Z1 for 7 G (1/4, 1/3), 
n^/2(dl2) - do) 4 C2Z1 + C3ZZ2 for 7 = 1/3, 
^(5/6-7) (J(2) _ 4 CsZZ2 for 7 G (1/3, 1/2); 
for f£^i andjG (0,1/4], 

n2T(dl2) - do) 4 Ci /or 7 G (0,1/4), 
ni/2(dl2)_do)4Ci + C2Zi /or 7 = 1/4; 
for f£^2 and 7 G (1/8,1/4], 

nV2(dl2) _ 4 c^z, for 7 G (1/8, 1/4]; 

where Ci = VV"(4)/[2/'(do)], C2 = <T/[/'(do)(l -p)!/^]^ ^3 = ^7^2/ 
iC, C is as given in Theorem 2.1, Z\ and Z2 are standard normal, Z follows 
Chernoff's distribution and Z, Zi,Z2 are mutually independent. 

Remark 3.2. Theorem 3.2 characterizes the convergence rate of the es- 
timator in terms of the size of the shrinking neighborhood. It shows that 
for 7 G [1/4, 1/3] the parametric rate of n^/^ is achieved given / G On 
the other hand, for the boundary values of 7 = 1/4 and 1/3, there exists 
asymptotic bias in the former case (for / G ^1), while in the latter case the 
asymptotic variance increases. For 7 > 1/3, the rate of convergence falls be- 
low ^/n, while for 7 < 1/4 and / G ^1 the limit distribution of the two-stage 
estimate is degenerate and thus not conducive to inference. Hence, these 
results suggest selecting 7 in the (1/4,1/3) range. Note that, the function 
class ^2 achieves the n^/^ rate of convergence for a slightly larger range 
of values for 7 than This is a consequence of the near linearity of / 
in the vicinity of do, which allows a good linear approximation of / with a 
relatively long interval [L,U]. 
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Remark 3.3. The case of 7 < 1/8 has been omitted for / G since it 
involves a Taylor expansion of / up to its fifth derivative. Nevertheless, in 
principle no other technical challenges are in play. 



3.2. Bootstrapped two-stage estimator. We consider next the asymptotic 
properties of the bootstrapped two-stage estimator, which is 



(3.3) J(2)'^ = (l/^^)(0o-^o^) = (l//3t; 



n2 



i=l 



where 



V"* -I- V* V* 



Y!* and 



(3.4) /3o* = (2n2)-^X;i^*+-dlV/3^, P{ = {2Kn~,^n,)-'Y.Yl 



We now present a probabilistic framework needed to clearly establish 
the asymptotic properties of the bootstrapped estimator rigorously. The 
point is that the design points and random errors involved in the sampling 
mechanism are assumed to come from triangular arrays but not necessarily 
from sequences. 

Let {{xm}"^]^}^^]^ be a triangluar array of distinct design points in [0, 1] 
and e a continuous random variable in R with mean and finite vari- 
ance (T^ > 0. Now, there exists, on some probability space {Q.,-s^ ,P), a set 
of random errors {{emlf^^^, which are i.i.d. copies 
of e. Then, suppressing the subscript n, {ej(a;)}"J^-^, {e^(a;)}"^-^, 

{£'l{uj)}^l]}'^^i represents a fixed triangular array of real numbers for a fixed 
w E rj, where n = ni + 2n2 with ni and 2n2 denoting the first and second 
stage sample sizes. 

Given cj S fi, according to the sampling mechanism used in the boot- 
strapped procedure, the data obtained from the first stage are given by 
{{xi,Yi{!jj))}^^-^, which are subsequently used to obtain dn}{(jj) and the lower 
and upper boundary points L{uj) and U{oj) to be used in the second stage. 
Hence, the second-stage data are given by {L(w), y/(a;)} and {U {u) ,Y!' {uj)} 

and the resulting estimate by dn {oj). The procedure then requires boot- 
strapping {yj'(w)}"^j^ and {y/'('^)}"=i, which is conceptually equivalent to 
bootstrapping {£'i{u)}Zi and {ef(a;)}gi to get {e't]Zi and {eflgi, so 
that y/* = f{L{u)) + e'* and y/'* = f{U{uj)) + e'l* for i = 1, 2, . . . , n2. Note 
that given u and n, the bootstrapped second-stage random errors {s'i^^i 
and {^nS 

^ are i.i.d. uniform random variables supported on {e^(a;)}^J^ 

and {^^'(w)}"^^, respectively. Finally, the bootstrapped estimate dn is cal- 
culated from {(L(w),y/*),(C/(a;),y/'*)}^ip 

Thus, given oj and with n increasing, the design points and random errors 
are sampled as rows from the fixed triangular array. Then the bootstrapped 
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random errors and {e'-*}^!^ also form triangular arrays as n varies. 

Given uo and n, the randomness of dii comes from the bootstrapping step. 

Under the above theoretical setting, in order to obtain the strong con- 
sistency of the bootstrapped estimator, we consider the following strong 
assumptions on the design points, the regression function and the random 
errors. 

(A2) There exists a distribution function G, whose Lebesgue density g 
is positive and has a bounded first derivative on [0,1], such that 
supjjgjo,!] iF'nix) — G{x)\ < n~^/^, where Fn is the empirical function of 
{xi}^^^ and "<" denotes that the left-hand side is less than a constant 
times the right-hand side. 

(A3) The regression function / G is differentiable on [0, 1] with 
ii^fxG[o,i] f'{x) and sup^g[o,i] /'(^) both positive and finite. 

(A4) All the absolute moments of e are finite, that is, E|e|'' < oo for all 

Remark 3.4. There exist triangular arrays of design points satisfy- 
ing (A2). For example, with Xi = i/n for i = 1, 2, . . . , n and all n, we have an 
array of discrete uniform designs on [0, 1] . Let G be the uniform distribution 
function on [0, 1]. Then, for this special array sup^.g[o,i] \Fnix) — G{x)\ < 1/n. 
Note that (A2) is stronger than (Al). A random variable with finite moment 
generating function in a small neighborhood of satisfies (A4), such as a nor- 
mal random variable. The assumptions (A2) to (A4) are essentially the fixed 
design versions of the assumptions for Lemma 1 of Durot [11], a modification 
of which enables us to identify a crucial boundary rate for the almost sure 
convergence of the isotonic regression estimator of do. This boundary rate 
plays a central role in the strong consistency of the bootstrapped estimator. 

Next, we state results on the strong consistency of /3i and the conditional 
weak consistency of /S^ and then on strong consistency of the bootstrapped 
estimator. Note that P* denotes the probability of the bootstrapped data 
conditional on the original data. 

Lemma 3.3. Iffe^,-fe (0,1/2) and (A2) to (A4) hold, 

A^/'(4), (P-a.s.), ^t^f'ido), {P-a.s.), 

p* 

where — ?■ denotes convergence in probability conditional on a given u. 
Theorem 3.4. If f e^, -/(^ (0, 1/3) and (A2) to (A4) hold, 



nV2(J(2>_J(2))^f;c2Zi, {P-a.s.), 
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where C2 and Z\ are as in Theorem 3.2. That is, 

sup\P*{n^/\d'^^* - dl^)) <t)- P{C2Zi < t)\ "4- 0. 

From the above strong consistency, the corresponding weak consistency 
follows under the same set of conditions. However, weak consistency can be 
obtained with the following weaker requirement on the random error: 

(A5) The sixth moment of e is finite, that is, Ee^ < 00. 

Theorem 3.5. /// G ^, 7 G (0, 1/3) and (Al) and (A5) hold, forte R, 
sup|P*(ni/2(JW* _ J(2)) <t)- P{C2Zi < t)| 4 0, 

where C2 and Z\ are as in Theorem 3.2. 

Remark 3.5. Comparing Theorem 3.4 with Theorem 3.2, we see that, 
under the strong assumption (A5) on the random errors, the bootstrapped 
estimator is strongly consistent for f G ^ and 7 G (1/4, 1/3), which is ex- 
actly the 7-range of most interest. Further, if / is locally linear at do, that 
is, / G ^2, the strong consistency continues to hold for 7 G (1/8, 1/4]. Sim- 
ilar conclusions on weak consistency hold by comparing Theorem 3.5 with 
Theorem 3.2, but under the milder assumption (A5) on the random errors. 

4. Performance evaluation. In this section, through an extensive simu- 
lation study we investigate the finite sample performance of the one-stage 
procedure (henceforth, OSP), the proposed two-stage procedure (TSP) and 
its bootstrapped variant (BTSP). 

Notice that for practically implementing the OSP, as well as the two-stage 
procedures, estimates of f'{do) and cr^ need to be obtained; the resulting 
procedures are called POSP, PTSP and PBTSP, respectively (Practical OSP, 
TSP and BTSP). For u^, we employ the nonparametric estimator proposed 
by Gasser, Sroka and Jennen-Steinmetz [13], which is based on local linear 
fitting. Suppose the data {{xi,Yi)}^^^ are already sorted in ascending order 
of Xi's. Then, we calculate 

n-l 
1=2 

where = aiYi_i + biYi+i-Yi, cj = {a} + hf + , = (xj+i - Xi)/(xj+i - 
Xi-i) and hi = (xj — Xi-i) / {xij^i — Xi_i), for i = 2, 3, . . . , n — 1. An estimate of 
f'{do) is obtained through the local quadratic regression estimator proposed 
by Fan and Gijbels [12], at the estimate dn^ . Specifically, let K{-) denote 
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the Epanechnikov kernel function and h> the bandwidth, so that Kh(- 
{\/h)K{- /h). Further, let r) = (r/o, 572) be given by 

2 



fj = arg min > 



2 



j=0 



Then, the local quadratic regression estimator of f'{dn^) is given by 571. The 
bandwidth h is chosen by first fitting a fifth order polynomial function to the 
data to obtain fpo\{x) = Yl^=o^j^'^ ■ ^^xt, the estimate of the third order 

derivative of / at is obtained by f^^^ {Sn^ ) = 603 + 24a4(il^^ + GOos (Sn^)^. 
Finally, the bandwidth h is calculated as 

where Ci,2(i^) = 2.275. 

For the two-stage procedures, the tuning parameters 7 and K need to 
be specified for obtaining the second-stage sampling points L and U. We 
select them as the end points of a high level Wald-type confidence interval 
calculated from the first-stage data; that is, 7 and K satisfy 

(4.1) Kn^^ = Cq0n~'/^ 

where qp is the upper /3 quantile of Z. On the other hand, a good quanti- 
tative rule for selecting the first-stage sample proportion p is not available; 
nevertheless, a practical qualitative rule of thumb dictates that p should 
decrease, while np should increase as the sample size increases. In our sim- 
ulation study, a number of different values for p are considered. 

Finally, due to presence of small sample sizes the following modification 
of the second-stage estimator is adopted: 

j(2) ^ rmin(max((0o -/3o)//3i,0),l), if A > 0, 

" I dnj , otherwise. 

The same modification applies to the bootstrapped second-stage estimator 
in BTSP. 

Remark 4.1. Note that our method for choosing the tuning parameters 
'y,K brings in another subjective parameter /3. However, the choice of /3 
is guided by a rational principle, namely the requirement that the chosen 
interval contain the truth with high probability. The magnitude of f3 is 
related to how conservative the experimenter wants to be in the construction 
of the second stage interval which is fundamentally subjective. Also, our 
rule of thumb regarding the choice of p is based on the idea that with larger 
budgets smaller p's at stage one will still lead to reasonably precise sampling 
intervals at stage two, leaving a larger proportion of points for stage two and 
the possibility of more accurate conclusions. 
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The basic settings of the simulation study are as follows: two regres- 
sion functions are considered, /i(x) = + x/5 and f2{x) = e'^^^~^-^^ / {1 + 
g4(x-o.5)^ for X S [0,1]. The first-stage design points are drawn from a dis- 
crete uniform distribution on [0,1], that is, Xj = i/{ni + 1). Further, the 
target is set to do = 0.5, the standard deviation of the random error a to 
0.1, 0.3 and 0.5, the sample size n ranges from 50 to 500 in increments of 
50, while the first-stage sample proportion p ranges from 0.2 to 0.8 in in- 
crements of 0.1. Finally, the levels of significance a and /? are set to 0.025. 
Note that /3 is only required to be small and the specific choice of 0.025 is 
somewhat arbitrary. The following quantities are computed: coverage rates 
and average lengths of confidence intervals, and mean squared errors of esti- 
mators. The simulation programs and more results can be found on the first 
author's webpage: www.stat.lsa.umich.edu/~rltang. In this paper, we show 
part of the results for saving space. 

Remark 4.2. Choosing 7 and K via equation (4.1) is theoretically 
equivalent to having 7 = 1/3 and K = Cqp. Notice that strictly speaking, 
neither strong nor weak consistency for 7 = 1/3 is expected to hold for the 
bootstrapped estimator. However, it is reasonable to expect that for realistic 
sample sizes, the performance of the bootstrap would be satisfactory, since 
7 = 1/3 is at the boundary of consistency. The obtained simulation results 
certainly vindicate this expectation. We would like to note that there are 
other bootstrap methods that could have been used, like the wild or resid- 
ual bootstrap or the m out of n bootstrap, but it is not clear whether they 
would yield consistency at 7 = 1/3. It would be interesting to explore some 
of these issues in future work. 

4.1. Comparison of two- stage procedures. By Theorem 2.1, from the first- 
stage data, an asymptotic (1 — 2/3) confidence interval for dp with the true 
parameter is given by 

[JW - Cg^n7^/^ + Cqpn~''^] n [0, 1]. 

We consider the above confidence interval as the sampling interval [i,C/] 
with 7 = 1/3 and K = Cqp. Then, by Theorem 3.2, for / € ^ and 7 = 1/3, 

nl/2(42) _^^^^C2Zi+ C3ZZ2. 

Hence, the corresponding asymptotic (1 — 2a) confidence interval of do is 
given by 

(4.2) [^2) - q^n-^/\ 4f) + q^n-^^] n [0, 1], 

where ija is the upper a quantile of C2Z1 + C3ZZ2. 
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Coverage Rates of TSP, BTSP and PBTSP 
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0.20.30.40.50.60.70 



0.20.30.40.50.60.70.8 



sigma=0.1 o sigma=0.3 + - - ■ 

Fig. 2. Coverage rate plot grouped with different a's. 



Next, we compare the two-stage procedures, focusing on the coverage 
rates. In the first row of Figure 2, the coverage rates of the (4.2) confidence 
intervals for combinations of /, n and a are shown based on 5000 rephcations, 
using the true parameters /'{do) and a (i.e., the true C, C2 and C3 in con- 
structing the confidence intervals). It can be seen that in general, coverage 
rates are below the nominal level 0.95, which is depicted by a solid horizontal 

~(2) 

line in each subplot. This reflects that dn usually has slow speed of conver- 
gence in distribution. As expected, the results improve for small noise levels, 
larger sample sizes and functions closer to linearity in the vicinity of do- 

The second row in Figure 2 shows the coverage rates of the bootstrapped 
procedure, based on 1000 replicates and 3000 bootstrap samples per repli- 
cate, using the true parameters /'(do) and a at stage one. It can be seen that 
the coverage rates achieve the nominal level with proper first-stage sample 
proportions, smaller values of which are preferred since both average lengths 
and mean square errors are usually increasing with p from simulation results 
not shown in this paper. It can be concluded that the BTSP exhibits superior 
performance to the TSP in terms of coverage rates, especially for settings 
with /i, moderate noise and relatively small sample sizes. 
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Finally, the third row in Figure 2 depicts the coverage rates of the boot- 
strapped procedure, when both f'{do) and a are estimated from the first- 
stage data. The results based on 1000 replicates and 3000 bootstrap samples 
per replicate indicate a high level of agreement with those of the BTSP, 
which in turn suggests that the PBTSP is reliable in applications. 

Our findings suggest that p = 0.4 is a good conservative choice for func- 
tions exhibiting a strong linear trend in the vicinity of do, while p = 0.5 is 
preferable otherwise. 

4.2. Comparison of one- and two-stage procedures. We compare next the 
POSP and the PBTSP, in terms of coverage rates and average lengths of 
confidence intervals. We also compare the mean squared errors of the first- 
and second-stage estimates of do- The results for POSP are based on 5000 
replications, while those of PBTSP on 1000 replications and 3000 bootstrap 
samples per replication, due to its computational intensity. It can be seen 
from Table 1 that both procedures usually perform well in terms of coverage 
rates. Further, under the PBTSP, confidence intervals usually have shorter 
average lengths, and the estimates for do have smaller mean squared errors, 
with slightly more gains accruing in the /2 case. However, it needs to be 
pointed out that both procedures suffer in the case with large noise and 
small to moderate sample sizes, especially for /i. 

Remark 4.3. One of the advantages of the bootstrap procedure, as poin- 
ted out in Section 2.3, is that its implementation does not require knowledge 

Table 1 

CR, AL and MSE stand for coverage rates, average lengths and mean 
squared errors of PBTSP while CRl, All and MSEl stand for those of POSP. 
ALR and MSER are the ratios of CR over CRl and MSE over MSEl, respectively 



f 


P 


a 


n 


CR 


CRl 


AL 


ALl 


ALR 


MSE 


MSEl 


MSER 


fl 


0.5 


0.1 


100 


0.944 


0.955 


0.06 


0.13 


0.43 


2e-04 


le-03 


0.21 








200 


0.943 


0.953 


0.04 


0.10 


0.37 


le-04 


7e-04 


0.15 








300 


0.952 


0.956 


0.03 


0.09 


0.35 


7e-05 


5e-04 


0.14 






0.3 
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0.927 


0.942 


0.21 


0.27 


0.79 


3e-03 


5e-03 


0.58 








200 


0.935 


0.947 


0.14 


0.21 


0.63 


le-03 


3e-03 


0.39 








300 


0.956 


0.947 


0.11 


0.19 


0.58 


8e-04 


2e-03 


0.33 




0.4 


0.1 


100 


0.971 


0.966 


0.06 


0.16 


0.40 


2e-04 


le-03 


0.16 
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0.951 


0.964 


0.04 


0.12 


0.34 


le-04 


9e-04 


0.13 








300 


0.950 


0.966 


0.03 


0.11 


0.31 


7e-05 


7e-04 


0.11 






0.3 
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0.952 


0.948 


0.24 


0.32 


0.76 


5e-03 


6e-03 


0.79 








200 


0.959 


0.956 


0.16 


0.25 


0.62 


2e-03 


4e-03 


0.46 








300 


0.948 


0.955 


0.12 


0.22 


0.53 


8e-04 


3e-03 


0.25 
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of /'(do). One might feel that the practical implementation of the bootstrap 
procedure defeats this advantage, since f'{do) is estimated from the first- 
stage data to construct the second stage sampling interval. However, note 
that only a rough and ready estimate of f'{do) would suffice for the purpose 
of setting the sampling interval. On the contrary, to set a confidence inter- 
val directly from the asymptotic distribution of the second-stage estimate 
requires a much more precise estimate of /'(do). Thus, the really crucial ad- 
vantage with the bootstrap is that it obviates the need for a precise estimate 
of /'(do). 

Remark 4.4. Notice that the sigmoid function /2 belongs to class ^2 
for the case do = 0.5, since its second-derivative vanishes at that point. It 
is of practical interest to investigate the performance of the PBTSP for the 
case where the regression function at the target point is close to, but not 
exactly, linear. We have examined the case for /2 and do = 0.4 and 0.6 under 
the previously considered settings. The curvatures (i.e., second derivatives) 
of the regression functions at these two points are about 0.76 and —0.76, 
respectively. The results are very close to those obtained for do = 0.5. 

Remark 4.5. In PBTSP, the second stage sampling points L and U are 
identified through a Wald-type confidence interval constructed via estimat- 
ing /'{do) and a^, with dn^ at the center of [-Z>,f/]. An alternative, albeit 
ad-hoc way of obtaining an interval centered at dni is to set L = dn^ — Ln/2 

and U = dn^ + Ln/2, where L„ is the length of a testing-based confidence 
interval for do obtained from the first-stage data. This testing-based interval 
is obtained as follows: consider testing the hypothesis i^o,d ^ /~^(^o) =d vs. 
Hi^d ■ /~^(^o) 7^ d. Let /^^^ denote the usual isotonic estimator of / from the 
stage one data and the constrained isotonic estimator under Hq ,^. The 
residual sum of squares based test statistic is given by 

where 0"^ is a consistent estimate of 0"^. The inversion procedure assigns d 
to the confidence set if RSS((i) falls below an appropriate threshold deter- 
mined by a pre-specified quantile of its limit distribution D (when d = do 
holds true), which is completely parameter- free and therefore enables the 
construction of the confidence set without the need for nuisance parameter 
estimation. The limit distribution of RSS(d'') can be derived by adapting 
Theorem 2 of [5] (where a likelihood ratio statistic is dealt with) to the resid- 
ual sum of squares statistic in the nonparametric regression setting, but see 
also [3] and [4] for a unified treatment of likelihood ratio and residual sum 
of squares statistics in monotone function problems. 
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Alternatively, we can use the extremities of the testing-based confidence 
interval themselves as the sampling points for the second stage. For both 
cases, simulations show that their results are very similar to those of PBTSP 
using the Wald-type confidence interval, thus implying that the procedure 
is not particularly sensitive to the exact specification of L and U . Note that 
although this testing-based approach has the merit of completely avoiding 
the estimation of f'{do), the asymptotic properties of the corresponding 
two-stage estimator and its bootstrapped variant become intractable since 
neither the testing-based confidence interval nor the length L„ admits an 
easy analytical characterization, unlike the analytically simple Wald-type 
confidence intervals used in this paper. To conform to the theoretical de- 
velopment and to save space, we only present simulation results for such 
Wald-type stage two sampling intervals. 

Remark 4.6. In the case of / E ^i, one may question the use of a linear 
working model for approximating / around cIq. Instead, fitting a higher order 
polynomial working model may seem more appropriate. We examined the 
case of /i using a quadratic working model. The results show that this model 
improves the mean squared error of the estimates when the noise is large, 
but leads to substantial under cover age. 

Remark 4.7. Our simulation results indicate that good choices for p 
are 0.5 for /i and 0.4 for /2, respectively. Our practical recommendation is 
p = 0.5, whenever no prior information about the linearity of / around do is 
available. 

5. Data application. We apply our methods to the engineering problem 
introduced at the beginning of this paper. We briefly describe the underlying 
system next: consider a complex queueing system comprising N first-in-first- 
out infinite capacity queues holding different classes of customers and a set of 
service resources. These resources are externally modulated by a stochastic 
process. The main issue is to allocate the available resources to the queue 
in an appropriate manner so as to maximize the system's throughput. This 
system represents a canonical model for wireless data/voice transmissions, 
in flexible manufacturing and in call centers (for more details, see [2]). 

An important quality of service metric is the average delay of jobs (over 
all classes). This quantity can only be obtained through simulation of the 
system, due to its analytical intractability. The average delay of the jobs in 
a two-class system as a function of its loading under the optimal throughput 
policy introduced in [2] is shown in Figure 1. It can be seen that delay is, 
in general, an increasing function of the loading. The response was obtained 
by a discrete event simulation of the system for each loading, based on 2000 
events. Notice that our ability to simulate the system at any loading in order 
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POSP: Estimation of Variance Function PBTSP: Estimation of Variance Function 




0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 



loading loading 

Fig. 3. Estimation of the variance function in POSP and PBTSP. 

to obtain the response, allows us to easily implement the proposed two-stage 
procedure. 

It is of interest to estimate do = /~^(^o) for 6*0 = 10 and 15 units of delay, 
since around loadings corresponding to those levels the quality of service 
provided by the system exhibits a significant deterioration. For comparing 
the one- and two-stage procedures, we fix a budget of n = 82. A fixed design 
wth spacing 0.01 was used in the interval [0.14,0.95] to obtain the one- 
stage data shown in Figure 1 (also in the left-panel plots of Figure 4). It 
can be seen that the response is heteroskedastic, but this does not affect 
the isotonic regression based estimation of / and thus of do. However, it 
impacts the construction of confidence intervals through the estimation of 
the variance at do- To overcome this issue, the variance function is estimated 
locally by the method proposed in [27]. More specifically, we compute the 
initial local variance estimates with the weig hts (l/\/2,-l/\/2) and the 
smoothed variance function by using glkerns in the R package lokern with 
an adaptive bandwidth, shown in the left panel of Figure 3. 

When implementing the two-stage procedure, we selected every other 
point from those used in the one-stage procedure {p = 0.5), thus resulting in 
a fixed design with spacing 0.02 on the interval [0.14,0.94]. The initial local 
variance estimates and smoothed variance function with the first-stage data 
are shown in the right panel of Figure 3. After obtaining the 40 = 2 x 20 
second-stage responses, the second-stage estimator of do was computed using 
weighted least squares, with weights being the reciprocals of the estimated 
local variances at the corresponding sampling points. 

The point estimates and the associated 95% confidence intervals from the 
POSP and the PBTSP are given in Table 2 and plotted in Figure 4. It can 
be seen that the point estimates are fairly similar. More significantly, the 
confidence intervals from PBTSP are much shorter than those from POSP, 
especially for the case ^o = 10. This can be attributed to two factors: (i) 
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Table 2 
Comparing POSP and PBTSP 







POSP n = 82 


PBTSP n = 81 = 41 + 2 X 20 


61 = 10 


estimates of do 


di^' = 0.803 


di^) = 0.799 




95% CI 


[0.764,0.841] 


[0.794,0.804] 


61 = 15 


estimates of do 


dl^' = 0.863 


dL^' = 0.857 




95% CI 


[0.839,0.887] 


[0.845,0.875] 



POSP n=82 thetaO=10: point estimate and CI 



PBTSP n=41-i-2''20 thetaO=10: point estimate and CI 



0.4 0.6 
loading 



0.4 0.6 
loading 



POSP n=82 theta0=15: point estimate and CI 



PBTSP n=:41+2*20 theta0=15: point estimate and CI 
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loading 
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loading 



Fig. 4. Comparing POSP and PBTSP. 



the applicability of the linear model locally and (ii) the presence of a strong 
signal (small noise) for the design points around 0.8. 

6. Conclusions. In this study, a two-stage hybrid procedure for estimat- 
ing an inverse regression function at a given point was introduced. The 
proposed procedure, by first obtaining a nonparametric estimate of the re- 
gression function and subsequently fitting a parametric linear model in an 
appropriately shrinking neighborhood of the parameter of interest, achieves 
a y/n rate of convergence for the corresponding estimator. Note that isotonic 
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regression was used in the first stage as it works with minimal assumptions 
on the underlying monotone regression function; nevertheless, other non- 
parametric procedures could be used. Further, the local approximation was 
primarily based on a linear model, although quadratic and suitable higher- 
order approximations could be used, especially in the presence of a small 
budget of design points, since the first stage sampling interval may not be 
short enough. 

A bootstrapped version of the two-stage procedure is provided to over- 
come the difficulties posed by the requirement of estimating the derivative 
of the regression function at the unknown target point and the slow speed 
of convergence, especially with moderate sample sizes. Its asymptotic prop- 
erties are also investigated and its strong consistency established (on this 
point, see also Remark A. 2). 

Our simulation results indicate that the practical bootstrapped procedure 
performs well in a variety of settings. Note that all the plans can be equipped 
with random designs for generating the first-stage data and similar asymp- 
totic results follow. Nevertheless, for relatively small budgets, fixed designs 
(e.g., quantile based) usually yield improved performance. 

Finally, we note that the main results generalize readily to heteroskedas- 
tic models of the form Y = f[x) + a{x)£^ where (j{x) is a scaling function 
that determines the error variance. Further, the proposed procedure should 
also work for discrete response models; for example, univariate binary and 
Poisson regression models with a monotone mean function. Qualitatively, 
the results are expected to be analogous to those established in this study; 
namely, a y/n rate of convergence would be obtained for the estimator of the 
parameter of interest. However, the asymptotic behavior of the second-stage 
estimator and its bootstrap counterpart would be different and depend in 
an explicit manner on the specific model under consideration. 

APPENDIX 

In order to establish the strong consistency of the bootstrapped two-stage 
estimator, we need a rate of the almost sure convergence for the one-stage 
isotonic regression estimator dn'^ of do. The following lemma, which is the 
fixed design version of Lemma 1 in Durot (2008) [11], provides a useful tail 
probability for dn^ . 

Lemma A.l. Suppose E|e|'^ < oo for some q>2 and (A2) and (A3) 
hold. Then, there exists K > 1, depending on q, such that for every € M 
and r] > 0, 

P{\d}^^ - do\>r]) < Kinrfyi'^ . 
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Proof. It will be shown that (A2) implies 
(A.l) sup \F-\u)-G-Hx)\<n~'/\ 

u£[0,l] 

Recall that "<" denotes that the left-hand side is less than a constant times 
the right-hand side. Then, reworking the proof of Lemma 1 in Durot [11] for 
our fixed design setting and an increasing function, and replacing expression 
(13) in that lemma with (A.l) ensures that all subsequent steps go through 
yielding the desired conclusion. To show (A.l), note that from (A2), we get 
\G~^{u) — G~^{v) \ — v\ for every u,v £ [0, 1]. Then 

sup \F~\u) - G~\u)\ 

uG[0,l] 

= max{\G~HG{xi)) - G~\t/n)\,\G~\G{x,+i)) - G~Hi/n)\, 
for i = 1,2, . . . ,n - 1, \G-\G{xi))\,\G-\G{xn)) - 1|} 
< max{|G(xi) - i/n\,\G{xi+i) - i/n|, for i = 1,2, . . . ,n - 1, 

|G(xi)-0|,|G(x„)-l|} 

= sup \Fn{x) — G{x)\ 
xe[o,i] 

gives (A.l) again by (A2). □ 

With the help of Lemma A.l, next we show that n^^^ is a boundary rate 
of almost sure convergence. 

Lemma A. 2. // (A2) to (A4) hold, for each a > 0, 
P( lim n^/3""|(il^) - (iol = o) = 1. 

\n— )-oo / 

Thus, for every r < 1/3, lim„_s.oo ""-'"(dn^^ — do) = 0, (P-a.s.). 

Proof. Use the notations K, q and r] in Lemma A.l. Denote K' = 
Kr]--'"/'^ and An = {ni/3-Q|ja) _ > gy Lemma A.l, P{An) < 
j,^/^-3aq/2 £qj. gg^g]^ Q > 0. On the other hand, (A4) allows q to be arbitrar- 
ily large. Choosing q > 2/(3a) gives Ejr=i^(^n) ^ i^' Ejr=i < ^■ 
Note that > is arbitrary. Therefore, n^^'^~°'\dn^ — do I converges to al- 
most surely (see Corollary on pages 254 and 255 in Shiryaev [34]), which 
completes the proof. □ 

Remark A.l. Note that Lemmas A.l and A. 2 hold for not only se- 
quences, but also triangular arrays of design points and random errors. 
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Remark A. 2. The proof of Lemma A. 2 implies n^/^~°'{dn'' - do) 
for each a G (0,1/3) given q > 2/{3a). Then, E|e|^ < oo ensures n^{dn^ — 
do) ^ for each /? < 1/4. However, this almost sure convergence result 
actually holds under a weaker condition E|ep < oo by theorem in Makowski 
[22] and Remark 4 in Makowski [23]. This shows that it might be possible 
to weaken the assumption (A4) a little. Essentially, it means that it might 
be possible to weaken the condition on the random error in Lemma A.l. 
In fact, this possibility has been mentioned in Durot's papers on isotonic 
regression [9-11]. 

A.l. Proofs for results in Section 3.1. For the simplicity of notation, 

from now on denote 5d = dni - do, = ^'1 + ^i' ^7 = ~ ^i' fuL = /(^) + 
/(L), /^^ = /([/) - /(L), = Ru + Rl, Rul = Ru- Rl, R'i^l = K + 
R'l, Rul = R'u - R'l- Recall Y+ = ¥(' + ¥[ and = ¥(' - ¥(. 

Proof of Lemma 3.1. Consider the following Taylor expansions: 

/([/) = /(4\)+Kn7^) 

(A.2) =f^do) + ndo){5d + Kn^^) 

+ {l/2)f"{do){5d + Kn^y + Ru, [-2pt] 

f{L) = f{d^^} - Kn-^) 

(A.3) =f(^do) + f'ido)i6d-Kn-^) 

+ il/2)f"ido)iSd-Kn-'f + RL, 

where Ru = f"iCi){Sd + Kn^'f /6, Rl = f"'{^2){Sd- Kn-"'f/6, 6 lies be- 

tween do and dn^ + Kn^^ and ^2 lies between do and dn} — Kn^^ . Since 

converges to do in probability by Theorem 2.1, so do ^1 and ^2- 
Then, from (3.1), the definitions of Y.- and Y.-' and the Taylor expansions 
(A.2) and (A.3), we get 

/3i = {2Kn^^n,)-'J2Y- = {2Kn-^)-^f^^ + {2Kn-^n2)-'J2e- 

i=l i=l 

n2 

= f'{do) + f"{do)5d + {2Kn-^)-^R^^ + (2i^n^^n2)-^ J^e". 

i=l 

p 

From Theorem 2.1, (5^ — )• 0; and by the Lindeberg-Feller CLT for tri- 
angular arrays, for 7 G (0,1/2), (?^i/f^2) e^" — > 0. Next we show that 

R^^/{2Kn^"') 4 for 7 G (0, 1). Hence, for 7 G (0, 1/2) we get /3i 4 /'(do). 
It suffices to show both nlRij and njRL converge to in probability for 7 G 
(0, 1). We only show the former; the latter follows in an analogous manner. 
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(A.4) 



From the definition of Ru, we liave 

nlRu = {l/6)n'lf"'{^,){6, + Kn~y 

= (1/6)/'" (6) + 3K6j + m^n^-^d, + K'nf\ 



Tlieorem 2.1 coupled with Slutsky's lemma, shows that the sum of the four 
terms within the square bracket in (A.4) is op(l) for 7 G (0, 1). Thus, we have 
nlRu = /"'(^i)op(l). Since /'"(•) is uniformly bounded around do and ^1 — 
do in probability, /"'(^i)op(l) = op(l). This shows that njRu converges to 
in probability for 7 G (0, 1). Obviously, Ru = op(l). 
Then, for 7 G (0,1/2), 

n2 



/3o = (2n2)-i5]y+-dlV/3i 



i=l 



f{do) + {l/2)f\do)[5l + K\- 



2^-271 
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+ 



/'(do)5d + imR^L + C^n^r'^ef - dl\)/3i 



i=l 



4/(do)-do/'(4). 
Finally, for 7 G (0,1/2), the weak consistency of /3i and /?o gives (in = 
(^o-/3o)/(/3i)4do. □ 

Proof of Theorem 3.2. First, suppose / G ^i. From (3.2), the defi- 
nitions of y/ and Y" and the Taylor expansions (A. 2) and (A. 3), we get 



dl2)_do = (i//3i) 



n2 



/(do)-(2n2)-^J]y,^ 



+ 5d 



(l//'(do)) 



'12 



/(do)-(2n2)-i^y,^ 



+ (/'((io)/3i)"'(/'(do)-A) 
= 5*1 + S'2 X 5*3, 

where 

Si = -/"(do)(2/'(do))-'(5^ + K^nf^) 

i=l 
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f{do)-{2n,r'^Y+ 



i=l 
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52 = if'ido)Pir 

53 = f'{do)5, + {l/2)f"{do){6j + K'nf^) + (l/2)i?+^ + (2n2)-^ 

1=1 

Next, consider the exact stochastic orders of the terms 81,82 and 83. 
We start with 81. From Theorem 2.1, 5^ = Op(n~^/'^); for 7 > 0, n^'^'' = 
Op{n-^^), Ru = Op(n-i) + Op{n-^^), Rl = Op(n-i) + Op{n-^^) and 
n^^ Sr=i ^i*" ~ Op(n~^/^). Note that these are the exact rates of weak con- 
vergence. Then, for 7 G (0, 1/2), Si = Ti + r2 + op{n"^'f V n-^/^), where 

n2 

Ti = -(2/'(do))"V"(do)i^'n-2^ T2 = -(2/'(do)n2)-^ 

i=l 

Thus, the possible main terms of 81 are Ti and T2. In the same way, we can 
obtain the main terms of «S'2 and 83 and then those of 82 x 83. Finahy, we 
have 5*1 + ^2 X Ss = Ti + T2 + T3 + R, where 

n = (2ir/3in-^n2)-^<5rf^er, R = op{n-^"' V n'^/^ ^ ^^"5/6). 

i=l 

It is easy to see that among the three rates n~'^"' , n~^^'^ and n'*'"^/^, the 
first, second or last one is slowest according as 7 belongs to the interval 
(1,1/4), (1/4,1/3), or (1/3,1/2), respectively; the first and the second are 
the slowest for 7 = 1/4; while the second and the last ones are the slowest 
for 7=1/3. In other words, Ti, T2 or becomes the main term according 
as 7 G (0, 1/4), 7 G (1/4, 1/3) or 7 G (1/3, 1/2), respectively. When 7=1/4, 
both Ti and T2 become the main terms and when 7 = 1/3, both T2 and 
become the main terms. 

Then, by Theorem 2.1, the Lindeberg-Feller CLT for triangular arrays, 
Slutsky's lemma and the Continuous Mapping theorem, and noting that 

ny^6d is independent of Y17=i ^'^^ ^^2 SSi ^7 ^^"^ t^Sit ef is 

uncorrelated with e~ , we obtain the results of the five cases for / G ^1 
defined by the different ranges of 7 in the statement of the theorem. 

For the purpose of illustration, we outline the case 7 = 1/3, for which T2 + 
T3 is the main term with exact stochastic order Op(n"^/^). Thus n^^'^{dn^ — 
do) and n^^'^{T2 + 73) have the same asymptotic distribution. Since 

(712 "2 \ 

nJ/=^5,,n-^/'^e+,n-^/'^6r pi (CZ, cZi, 0^2), 
i=l i=l J 



"2 

f"{do)6d + i2Knr'r'RijL + (2Kn-V)"i J^er , 

i=l 
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where Z follows Chernoff distribution, independent of Zi, Z2 which are i.i.d. 
A^(0, 1), and c= \/2o", by Continuous Mapping theorem, we have 

^1/2(^2 + Tg) A -C2ZX + (1/K)C72CZZ2. 

Note that — C2Z1 can be replaced by CiZ\ since A^(0, 1) and — A^(0, 1) have 
the same distribution. In similar fashion, we obtain the asymptotic results 
for the other four cases. 

Carefully examining the above proof reveals that the conclusions with 
7 S (1/4, 1/2) also hold for / G ^ . Thus, it remains to show the cases / G 
and 7 G (1/8,1/4]. 

For / G ^2) consider the following Taylor expansions: 

/(f/) = MV+^nr) 

(A.5) =/(do) + /'(do)(5, + An^^) 

+ (l/6)r(do)(5d + /W^)=^ + «'f/, 
/(L) = /(dW-i^nr^) 

(A.6) =/(do) + /'(do)(5d-/W^) 

+ (l/6)r(4)(5d-A'n-^)=^ + /2i, 

where = /(4)(ei)(<5d + An-^)V24, fi^ = /(4)(6)(<5d - ^n-^)V24, 6 lies 
between do and $nl + Kn^'^ and ^2 lies between do and — Kn^"^ . 
Then, for 7 G (1/8, 1/2), 



dl2)_4 = (i//3i) 



n2 



/(do)-(2n2)-i^y,^ 



(l//'(4)) 



"2 



/(do)-(2n2)^i5]y,^ 



+ (/('io)/3i)-'(/'(4)-/3i) 

51+52X^3, 



"2 



/(do)-(2n2)-^j;y+ 



i=l 



where 



Si = -{6f'{do))-'f"'{do)6l - {2f{do))-'f"'{do)5dK^nf'^ 
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- {2f'{do)r'R'^r. - (2/'(do)n2)-^5^e+, 



i=l 
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^2 = (/'(do)/3i)- 

n2 

+ {2Kn-^)-'R'^^ + {2Kn-^n2)-'J2'^ ' 

i=l 

^3 = < / ((io)5d + — Q — 5d + — ^ — SdK +2^UL + ^Z^^^Tj- 

Similar to the previous argument on the exact weak convergence rates, 
S1 + S2X S3 = Ti + T2 + R', where 

n2 n2 

Ti = -i2f'{do)n2)-'J2et, = (l//3i)J,(2Kn^^n2)-^ J^e", 

i=l i=l 

and i?' is the sum of the remaining terms which converges to faster than 
Ti and T2. Then Ti becomes the main term for 7 G (1/8, 1/3) and the weak 
convergence result for / € and 7 S (1/8,1/4] follows easily from the 
Lindeberg-Feller central limit theorem for triangular arrays and Slutsky's 
lemma. This completes the proof. □ 

A. 2. Proofs for results in Section 3.2. To simplify arguments, we intro- 
duce a notation on the rate of almost sure convergence. Suppose {Cn} is 
a sequence of random variables and 5 G M. Write Cn = Bas{b) if ?^"Cn con- 
verges to almost surely for every a < 6. It is easy to verify that Bas{hi) + 
Bas{h2) = Basih) and Bas{bi)Bas{b2) = Basih + 62) if h<b2e R. Note 
that Cn = Bas{b) for some 6 > implies Cn ~^ almost surely. Denote Vj^ = 
£*+ = e'l* + and V' = e*' = ef - e'*. Recall Y*+ = + and 
V*— _ Yfi-k _ Y/* 
i i i ■ 

Proof of Lemma 3.3. The proof of Lemma 3.1 establishes the weak 
consistency of /3i for the case 7 S (0,1/2). In fact, under the setting of 
the bootstrapped two-stage procedure, the strong consistency of $1 can be 
obtained. 

From the proof of Lemma 3.1, it suffices to show S^, {nl/n2) and 
R^j^/{2Kn^'^') converge to almost surely. Lemma A. 2 shows that dd con- 
verges to almost surely, while Lemma A. 5 establishes that (71^/722) Yll=i ^7 
converges to almost surely for 7 G (0, 1/2). Thus, it suffices to show that 
both njRu and njRL converge to almost surely for 7 G (0,1). Next, we 
show the former; the latter follows analogously. 

Since ^1 lies between do and dn^ + Kn^^ and the latter converges to 
do almost surely, we know ^1 converges to do almost surely. On the other 



{l/2)f"{d,)5l + {l/Q)f"\d^)K^n- 
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hand, /"'(•) is uniformly bounded around do; thus, /"'(Ci) is almost surely 
bounded. Further, by Lemma A. 2, the four terms within square brackets 
on the right-hand side of (A. 4) are Bas{l — 7), i?as(2/3), Sas(l/3 + 7) and 
Bas(2^)- Thus, njRu almost surely converges to for 7 G (0, 1). 
So, for 7 G (0, 1/2), we have /3i f'{do), (P-a.s.). 

Next, we establish the conditional weak consistency of (3^ for f £ From 
(3.4), we get 

n2 
i=l 

where 

Ti = {2Kn-^n2)-'Y.^t. T, = {2Knr')-' fuL- 
1=1 

Hence, we have Ti = Tn + T12, where 

n2 n2 
i=l i=l 

Vr=e*^-,u- = E4V-] = (1^) Erii and 

i=l \ i=l / i=l \ i=l / 

P* 

For 7 G (0, 1/2), gives that T12 — )■ 0, (P-a.s.) by Lemma A. 5 and Tn — 0, 

p* 

(P-a.s.) by Lemma A. 6 and Slutsky's lemma. Thus, Ti — )• 0, (P-a.s.). 

Next, we consider T2. By the almost sure convergence of 5d and R^^/ 
{2Kn^'^), we have, for 7 G (0, 1), 

T2 = /'(do) + f"ido)5d + {2Kn~^r^R-^ ^ /'(do), (P-a.s.). 

Thus, for / G =^ and 7 G (0,1/2), Ta ^ f'{do),{P-a..s.). Therefore, we get 
/3^^/'(do),(^'-a.s.). □ 

Proof of Theorem 3.4. From (3.2) and (3.3), 
where 

n2 

Ti = (/'(do)2n2)-^.i/2 J](y/+ - y+), 
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/ "2 \ 



/ n2 N 

- (l//3i - l//'(do)) /(do) - (2n2)-i J]y+ 



By the definitions of F/, F/', F/*, F/'*, 



i=l 



i=i i=i V ^ 



= nV2(/'(4)2n2)-^ - e+) = .nV2(2/'(do)nV2)-i ^ 

where 



n2 



i=l 

and ■S'^ = Var^[V^"*'], equal to that in the proof of Lemma 3.3. 

Lemma A. 4 gives — )■ 2(7^, (P-a.s.) and Lemma A. 6 gives ^^=1^1' ~ 

^t)/i^V^) ^ Zi, (P-a.s.). Note that a/2/(1 -p). Thus, Slut- 

sky's lemma implies 

In Lemma A. 3, following this proof, we show that for 7 S (0, 1/3), T2 — )• 0, 
(P-a.s.). Therefore, another application of Slutsky's lemma completes the 
proof. □ 

Lemma A. 3. For f e ^ and 7 e (0, 1/3), T2 ^ 0, (P-a.s.). 
Proof. Let 

/ = /3i - /'(do), // = /(do) - (2n2)-i 5] y+, 

i=l 

A = Pt-L B = (2n2)-'J2('':+-et), 

i=l 

T21 = n^l'^A -I -11, T22 = n^l'^l ■ B, 
T23 = n^/'^II • A, T2A = n^l'^A ■ B. 

Then 

T2 = n^'\-(Plf'(d,))-\l + A] ■ [II -B] + (Pif'(do))-'l ■ II} 
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= {Mif'{do)r'n'/^A.I-II 

- 0tf'{do)y^ [-n^/'^I • B + n^/^II • A - n^/^A • B] 

= {PJtf'ido)r^T2l - Wtf'ido)r'[-T22+T23-T24]. 

It will be shown that Tgi ^ 0, (P-a.s.), i = 1, 2, 3, 4 for 7 G (0, 1/3). Thus, 
by Lemma 3.3 and Slutsky' lemma, the conclusion of this lemma holds. 

We establish next the convergence of the terms T2j. From (3.1), (3.4), the 
definitions of Y- , , ¥■* and y/'*, and the Taylor expansions of f{L) and 
/([/) [(A.2) and (A.3)], we have 

n2 n2 

A = {2Kn2)-'nJ J^i^^ -^7) = {2Kny'r'nlsn-^"Y.^Vr - u~)/s, 

i=l i=l 

n2 ri'z 

B = {2n2)-' - 4) = i^nl^Y'sn,'/' Y.^V^'' " ^^^'^ 

i=l 1=1 

1 = ^1- /'(do) = f"{do)6d + {2K)-^njR^^ + {2Kn2)-^nJ 

i=l 

ri2 

II = f{do)-{2n2)-'J2Y+ 
i=l 

n2 

= -ndo)Sd - {l/2)f"{do){6j + K^nf^) - (l/2)i?+^ - (1^)5^6+ 

i=l 

First, consider T21. We have 

T21 = n'/'A ■1.11 = T!2,sn^^'^ Y.{Vr - v-)/s, 

i=l 

where T^^ = • / • // and C„ = n^/'^n:i{2Kny\^ . Lemmas A.4 and A.6 
give 
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s^V2a, (P-a.s.), n^^'^"Y{Vr - v~) / s ^ Z2, (P-a.s.). 

i=l 

Next, it will be shown that T.21 converges to P-almost surely for 7 G 

P* 

(0,5/12). Then, an application of Slutsky's lemma gives T21 — )• 0, (P-a.s.). 

With the notation introduced at the beginning of this subsection and by 
Lemmas A.2 and A. 5, we have, for 7 > 0, nl = Basi—j), (5^) = i?as (1/3), 
E£i(^f + ed/n2 = BUim and " e[)/n2 = 5^.(1/2). Both Ru 

and Rl are equal to S^, (1) + B^s (7 + 2/3) + B^s (27 + 1/3) + Bas (37) • Thus 
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we have Cn = Bas{-l), I = Sas(l/3) + Bas{-l)[Bas{l) + Basil + 2/3) + 
Bas{2i + 1/3) + Basi-il)] + Bas{l/2 - 7) = 5,,(l/3) + Bas{2l) + S„,(l/2 - 
7) and // = Bas{l/2,) + [B,,(2/3) + Bas{2l)] + (5a.(l) + Basil + 2/3) + 

Bas{2i + 1/3) + Basi'il)) + Bas{l/2) = 5,,(l/3) + Bas{2i). Thus, for 7 G 
(0,1/2), 

T^i = a • / • // 

= Bas{-l) X [5as(l/3) + {Bas{2l)) + S,,(l/2 - 7)] 

X {Bas{l/^)+Bas{2-^)] 
= Basi2/3 - 7) + Basil/3 + 7) + 5a. (5/6 - 27) + ^,,(37). 

It is easy to see that when 7 G (0,5/12), the above upper bounds 1/2 — 7, 
1/4+7, 3/4 — 27, and 37 are all positive. This implies that converges to 
P-almost surely for 7 G (0,5/12). Therefore, for 7 G (0,5/12), T21 converges 
to in probability (P-a.s.). 

Similarly, we can show that T2i, z = 2, 3 or 4, converges to in probability 
(P-a.s.), but with different intervals for 7. We next list these results. For 
7 G (0,1/2), T22 and converge to in probability (P-a.s.) and for 7 G 
(0,1/3), T23 converges to in probability (P-a.s.). It is worthwhile to note 
that ^ can be considered directly because the Pas (1/3 — 7) term in T23 
does not depend on /"(do). Since 1/3 < 5/12 < 1/2, T2i converges to in 
probability (P-a.s.) for i = 1,2,3,4 and 7 G (0,1/3). Thus, for / G ^ and 
7 G (0, 1/3), T2 converges to in probability (P-a.s.). □ 

Proof of Theorem 3.5. Consider < 7 < 1/3. Given an arbitrary 
subsequence {n^}^^ of {n}^^^, let ni = np and n/c 1 = Uf^p. By Theorem 2.1, 

we know that n]'(5rf) = {np)'^{dnp — do) 0. It follows, by the relationship 
between convergence in probability and almost sure convergence (e.g., see 
Theorem 20.5 in Billingsley [7]), that there exists {^^(j)}^]^, a further sub- 
sequence of {nfc}, such that n^^^^ lidn^i^i-, 1 — do) — ^ 0, (P-a.s.). It now suffices 
to show that 

Let nfc(j)_2 = nk{i){l -p)/2. Write Cnfc(,) = Bas{b) if n^(i)Cnfc(,) converges 
to almost surely for every a <b. As in the proof of Theorem 3.4, write 
n\^^'^{dn^*i) ~ dn^i^i-)) as — Ti +T2, where both T\ and T2 are now indexed by 
nf,(j^. It is then not difficult to show that the conditional distribution of Ti 
converges to that of C2Z1 P-almost-surely by replacing n, ni and n2 by 
nfc(j), nfc(j) 1 and nfc(j) 2^ respectively, and essentially repeating the steps in 
Theorem 3.4. 
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It remains to show that r2 — t- (P-a.s.). The proof of this follows from 
that of Lemma A. 3 by replacing n, ni and 712 by n^.(j), nfc(j)^i and rafc(j) 2; 

respectively, and noting that dn^i^ i — do = Pas (1/3). □ 

A. 3. Some auxiliary lemmas. First, we state a special almost sure con- 
vergence result on a triangular array of i.i.d. mean zero random variables. 
For the general result, see Proposition in Hu, Moricz and Taylor [18]. 

Lemma A. 4. If a triangular array of random variables {Xni}^\ for n G 
N are i.i.d. copies of a mean random variable X with rUn increases to 00 as 

n goes to 00 and E\X\'^p < 00 for some p G [1,2), i-*(lim„_>.oo ?7Z„ Si^l Xni = 
0) = 1. 

Suppose a triangular array of random variables {£ni}^i for G N are i.i.d. 
copies of £ with mean 0, where m„ increases to 00 as n goes to 00. Then 
Lemma A. 4 tells that e„ = (l/m„) ^^"^ e„i and (l/m„) ^^"^ e^- converge 
to and cr^ almost surely given Ee^ < 00 and Ee^ < 00, respectively. Further, 
the following lemma shows that n^^"^ is an upper boundary rate of the almost 
sure convergence of En- 

Lemma A. 5. IfEe^ < 00, P(lim.„^oo?7T-"e„ = 0) = 1 for each a < 1/2. 

Proof. A direct application of Lemma A. 4 gives that if E|e|2p <oo for 

some p G [1)2), P(lim.„^oo "in ^^^£n = 0) = 1. On the other hand, Ee^ < 00 
implies that E|e|^P < 00 for every p G [1,2). Thus, the conclusion follows. □ 

Suppose {e'j}f^i, {e'-}^^^, {e'*}^^^ and {e'-*}f^i are the second-stage ran- 
dom errors and the corresponding bootstrapped ones defined in Section 3.2. 
Note that the subscripts of these random variables indicating the sample 
size are suppressed for the simplicity of notation and that here "n" is under- 
stood as a dummy variable, not the total sample size. Recall V^^ = £•'* + e'*, 
= £'*[yj"'"], V[~ = e'l* — e'* and v~ = £'*[1^~], where E^, means the expec- 
tation conditioning on the second-stage data. Since Var^[V^'^] =Var^[l^~], 
we denote both as s^. The following lemma shows that both and V~ 
are asymptotically normal P-almost surely. 

Lemma A. 6. //Ee^ <oo, we have 

— > ^ (P-a.s.), 

s 

1=1 

-^y^- yZ, (P-a.s.), 

1=1 

where Z follows a N(0, 1) distribution. 



TWO-STAGE ESTIMATION OF INVERSE REGRESSION FUNCTION 



33 



Proof. We only prove the former and the latter can be shown similarly. 
Let = (Vi'^ - i'~^)/iV^s), for i = 1, 2, . . . , n, and Sn = Y17=i ^ni- It is easy 
to see that -E'*[^ni] = and Var^[5.„] = 1. Thus, it suffices to check that the 
following Lindeberg condition holds for each 77 > (see, e.g.. Theorem 2 on 
page 334 of Shiryaev [34]): E^ClAlCml > r]}] 0, (P-a.s.). Note that 

n 

Y.E*i^ni{\U > V}] = E.iiiV^ - '^^)/s]'{\{V,+ - U+)/s\ > V^T^}) 
i 

< {y/nr])~^\s\-''^E^\V^+ - u+f, 



i=l \ i=l / i=l \ 1=1 



then it is sufficient to show limn^ooEi,\V^ — < 00, (P-a.s.). Since 

E^\V^+ -ly+f < E4\V^+f + \i^+f + 3\V^+\^\u+\+3\V^+\\i^+\^] 

= E^\v+\^ + 3|i/+|^^|y+|2 + 3|i/+|2^*|y+| + 

and 1^+ = ^ E£i(ei' + ei) 0' (-P-a.s.), it suffices to show Iim.„^oo-E*|^i^|'' < 
00, (P-a.s.), for A; = 1, 2, 3. We only need to show the case where k = 3. From 
(a + b)^ < 4(a^ + 6^) for nonnegative a and b, 



n 



j=i 1=1 \ i=i i=i / 



By Lemma A. 4, both (1/n) ^11=1 ki'P ^^^"^ Z^iLi kil^ converges almost 
surely under the assumption Ee^ < 00. Therefore, lim„_>.oo-£'vc|^i^P < 00, 
(P-a.s.), which completes the proof. □ 
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